Retrieval from Video and Pictorial Databases Employing

Similarity  and  Motion 

Jezekiel  Ben-Arie,  A.  Prasad  Sistla  and  Clement  Yu 
3. Report of Inventions: None

4.  Scientific  Progress  and  Accomplishments 
Introduction 

This project proposes research towards the development of a system for similarity-based
retrieval from video and pictorial databases. The salient features of the proposed
system are:

•  An  expressive  query  language,  called  Hierarchical  Temporal  Logic  (HTL)  [14],  for 
expressing  spatio-temporal  queries  on  video  databases. 

• A modular similarity-based retrieval system [11] for the temporal part of the query
which can be used on top of any suitable picture retrieval system [9] [14].
• An image understanding system for extracting the objects and activities [1] [2] [3]
[4] [5] [6] [12] in the videos in the database which uses local and global color
characteristics, motion signatures, shape characteristics and texture.

• An activity recognition module that recognizes human actions and is used to
interpret video shots with respect to temporal events.

• A dictionary-based system for translating the parts of the query into feature vectors
[4] [5] [6] [12] [15] which are searched in a space consisting of the feature vectors
corresponding to the videos present in the database.

•  An  incremental  learning  module  which  expands  the  dictionary  from  examples. 
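
The dictionary and incremental learning features listed above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes, purely for illustration, that each dictionary entry is an axis-aligned box (a simple kind of hyper-region) in feature space and that learning from an example means growing the box just enough to contain it. The names HyperRegion and FeatureDictionary, the term "red car" and the three-dimensional feature values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class HyperRegion:
    """Axis-aligned box in feature space (an assumed, simplified region shape)."""
    low: List[float]
    high: List[float]

    def contains(self, vector: Sequence[float]) -> bool:
        # A vector matches when every coordinate lies inside the box.
        return all(lo <= x <= hi for lo, x, hi in zip(self.low, vector, self.high))

    def expand(self, vector: Sequence[float]) -> None:
        # Generalize the entry: grow the box just enough to include the example.
        self.low = [min(lo, x) for lo, x in zip(self.low, vector)]
        self.high = [max(hi, x) for hi, x in zip(self.high, vector)]


class FeatureDictionary:
    """Maps a query term (object or activity name) to a hyper-region."""

    def __init__(self) -> None:
        self.entries: Dict[str, HyperRegion] = {}

    def learn(self, term: str, example: Sequence[float]) -> None:
        # Add a new entry, or generalize an existing one, from a single example.
        region = self.entries.get(term)
        if region is None:
            self.entries[term] = HyperRegion(list(example), list(example))
        else:
            region.expand(example)

    def matches(self, term: str, vector: Sequence[float]) -> bool:
        region = self.entries.get(term)
        return region is not None and region.contains(vector)


# Hypothetical 3-dimensional feature vectors, for illustration only.
dictionary = FeatureDictionary()
dictionary.learn("red car", [0.9, 0.1, 0.2])
dictionary.learn("red car", [0.8, 0.2, 0.1])
print(dictionary.matches("red car", [0.85, 0.15, 0.15]))
print(dictionary.matches("red car", [0.2, 0.9, 0.9]))
```

A box is only one possible region shape. It is used here because expanding it from an example takes a single pass over the coordinates.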


HTL uses the classical temporal operators to specify temporal properties of videos
(i.e. the sequencing of video segments), and it employs level modal operators to specify
such properties at different levels in the video hierarchy [7] [9]. At the atomic predicate
level, the language allows specification of properties on the contents of a single video
segment, including objects and motions (activities). For a given user query, each atomic
predicate in the query is translated into feature vectors [4] [5] [6] (or hyper-regions in
feature space) using a dictionary-based scheme. These feature vectors are then searched,
using multi-dimensional indexing, in the database consisting of the feature vectors of all
the video segments in the video database. The result is a similarity list in which each
entry contains the id of a relevant video segment and a similarity value denoting how
closely that segment satisfies the atomic predicate. Such similarity lists for the atomic
predicates can also be obtained by using a picture retrieval system. The similarity lists
for the atomic predicates are combined to obtain a similarity list for the main HTL
query. The unique aspect of our feature vector extraction is the combination of static
generic object characterization in pictures [4] [5] [12] [15] and motion/activity
recognition [11] in videos, using a unified approach based on multidimensional indexing
both in the image domain and in the action domain.
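
The flow from atomic predicates to a combined similarity list can be sketched as follows. This is an illustration under stated assumptions, not the implemented system: similarity is taken to be 1 / (1 + Euclidean distance), a linear scan stands in for the multi-dimensional index, and a conjunctive query is scored by summing the per-predicate similarity values. The function names, segment ids and feature values are hypothetical, and for simplicity both predicates share one feature space.

```python
import math
from typing import Dict, List, Sequence, Tuple

SimilarityList = List[Tuple[str, float]]  # (segment id, similarity value)


def similarity_list(query: Sequence[float],
                    segments: Dict[str, Sequence[float]],
                    threshold: float = 0.0) -> SimilarityList:
    """Score every segment against one atomic predicate's feature vector.

    Assumption: similarity = 1 / (1 + Euclidean distance). A linear scan
    stands in for the multi-dimensional index mentioned in the text.
    """
    scored = []
    for seg_id, vector in segments.items():
        sim = 1.0 / (1.0 + math.dist(query, vector))
        if sim > threshold:
            scored.append((seg_id, sim))
    return sorted(scored, key=lambda entry: entry[1], reverse=True)


def combine_conjunction(lists: Sequence[SimilarityList]) -> SimilarityList:
    """Combine per-predicate lists into one list for a conjunctive query.

    Assumption: a segment's combined value is the sum of its per-predicate
    values; a segment missing from a list contributes 0 for that predicate.
    """
    totals: Dict[str, float] = {}
    for sim_list in lists:
        for seg_id, sim in sim_list:
            totals[seg_id] = totals.get(seg_id, 0.0) + sim
    return sorted(totals.items(), key=lambda entry: entry[1], reverse=True)


# Hypothetical 2-dimensional feature vectors for three video segments.
segments = {"s1": [0.0, 0.0], "s2": [3.0, 4.0], "s3": [1.0, 0.0]}
person = similarity_list([0.0, 0.0], segments)   # atomic predicate 1
walking = similarity_list([1.0, 1.0], segments)  # atomic predicate 2
combined = combine_conjunction([person, walking])
for seg_id, value in combined:
    print(seg_id, round(value, 3))
```

Summation is only one candidate combining function. Any monotone function of the per-predicate values could be substituted in combine_conjunction without changing the rest of the sketch.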

The project proposes extensions to the HTL query language by incorporating additional
temporal operators in order to make it more expressive. A graphical user interface
for specifying HTL queries will be developed. To improve the accuracy of the
similarity-based retrieval, alternate similarity functions will be investigated. Techniques
such as temporal indexing (i.e. indices on the time dimension) will be investigated in
order to enhance the performance of the retrieval system. Extensions to the currently
implemented picture retrieval system are also proposed. The project also proposes
development of new methods for segmentation of a video stream into shots using an edge
detection technique based on feature vectors of the frames. Also proposed are novel image
segmentation methods and methods for person detection based on a novel approach of
model-based segmentation and color characteristics. The research proposes generic
recognition of objects and activities employing a flexible dictionary approach that
translates atomic predicates into feature vectors, which are then matched with
corresponding features of video frames/shots. The dictionary can also represent generic
inanimate or animate objects and activities by hyper-regions in the feature vector space.
It is proposed to develop learning techniques which incrementally expand the dictionary
with additional entries and generalize existing entries.
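
The idea of segmenting a video stream into shots by edge detection on frame feature vectors can be sketched as below. The text does not specify the edge detector, so the sketch assumes the simplest one: the Euclidean distance between consecutive frame feature vectors is treated as a one-dimensional signal along time, and a value above a threshold marks a shot boundary. The function names, the threshold and the feature values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def shot_boundaries(frame_features: Sequence[Sequence[float]],
                    threshold: float) -> List[int]:
    """Return each index i at which a new shot is assumed to start.

    The distance between consecutive frame feature vectors is thresholded;
    a large jump is treated as an edge in the temporal feature signal.
    """
    boundaries = []
    for i in range(1, len(frame_features)):
        if math.dist(frame_features[i - 1], frame_features[i]) > threshold:
            boundaries.append(i)
    return boundaries


def split_into_shots(num_frames: int,
                     boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn boundary indices into half-open (start, end) frame ranges."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return [(s, e) for s, e in zip(starts, ends) if e > s]


# Hypothetical per-frame feature vectors (e.g. coarse color statistics).
frames = [[0.10, 0.10], [0.12, 0.10], [0.90, 0.80], [0.88, 0.82], [0.10, 0.90]]
boundaries = shot_boundaries(frames, threshold=0.5)
shots = split_into_shots(len(frames), boundaries)
print(boundaries)
print(shots)
```

A fixed threshold is sensitive to gradual transitions such as fades. Smoothing the distance signal before thresholding is one possible refinement.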
Goals, Objectives, and Targeted Activities

• Extensions To The Hierarchical Temporal Logic (HTL) Language: The HTL language
will be extended by introducing new temporal operators. The temporal part
of the extended language will be at least as expressive as regular expressions.

• Extensions To The Similarity List Generator: New similarity functions corresponding
to the different temporal operators will be introduced.

•  User  Interface:  A  user  interface  for  specifying  the  atomic  predicates  in  the  HTL 
language  will  be  developed. 

•  Extracting  Features  From  Video  And  Segmentation: 

•  Segmentation  of  color  images  based  on  local  and  global  color  and  shape 
characteristics  [1]  [3]  [4]  [5]  [6]. 

•  Using  Karhunen-Loeve/Principal-Component  Expansion  for  model  based 
segmentation  of  shapes  and  persons  [2]  [4]  [12]. 

•  Action  recognition  by  modeling  human  body  junctions  and  recognizing  their 
motions  for  generic  actions  [11]. 

•  Matching  Video  With  Queries  -  Generic  Objects  And  Activities: 

• Development of combined segmentation methods for objects that will provide
a robust background for reliable feature extraction and for spectral signature
extraction for robust recognition.

• Investigation of neural networks [3] [13] and 3-dimensional frequency domain
representations for extracting 3D characteristics of surfaces from monocular
images, for the purpose of pose-invariant 3D object recognition [6] [16] [17].
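
The goal of similarity functions for temporal operators, listed above, can be made concrete with a small sketch. The actual HTL similarity functions are not given in this section, so the scoring rules here are assumptions: each video segment has a similarity value in [0, 1] for a predicate, "eventually p" at a segment takes the best value of p at that segment or any later one, and "p followed by q" adds the value of p at a segment to the best value of q at any strictly later segment. The function names and the example values are hypothetical.

```python
from typing import List, Sequence


def eventually(sims: Sequence[float]) -> List[float]:
    """result[i] = best similarity of the predicate at any segment j >= i."""
    result = [0.0] * len(sims)
    best = 0.0
    # Scan from the last segment backwards, carrying the running maximum.
    for i in range(len(sims) - 1, -1, -1):
        best = max(best, sims[i])
        result[i] = best
    return result


def followed_by(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """result[i] scores: p holds at segment i and q holds at a later segment.

    Assumed scoring: the sum of the two similarity values.
    """
    if len(p) != len(q):
        raise ValueError("both predicates must be scored on the same segments")
    later_q = eventually(q)
    result = []
    for i in range(len(p)):
        # Best value of q strictly after segment i (0 if i is the last one).
        after = later_q[i + 1] if i + 1 < len(q) else 0.0
        result.append(p[i] + after)
    return result


# Hypothetical per-segment similarity values for two atomic predicates.
walking = [0.9, 0.2, 0.1, 0.0]
sitting = [0.0, 0.1, 0.8, 0.3]
scores = followed_by(walking, sitting)
best_start = max(range(len(scores)), key=lambda i: scores[i])
print(best_start, scores)
```

Both functions run in time linear in the number of segments, since the backward scan in eventually is computed once and reused.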

Significant  Theoretical  Advances 

1. Discovered a novel Volumetric Frequency Representation [6] [16] [17] that encapsulates
both the 3D structure of objects and a continuum of their views.
This establishes a new approach in computer vision which can be used for many
applications, such as pose-invariant object/face recognition, shape reconstruction from
single images, and 3-D motion recovery.

2.  Discovered  and  developed  a  novel  method  for  segmentation  of  objects  based  on 
their  generic  models  [4]  [12].  This  method  is  very  useful  in  detection  of  objects 
and  persons  in  cluttered  scenes. 


Significant  Experimental  Advances 

1.  The  Similarity  list  generator  for  a  subclass  of  HTL  queries,  called  conjunctive 
queries,  has  been  implemented.  Preliminary  results  are  encouraging. 

2. We have succeeded in segmentation of human faces using color and frequency
characteristics. We have completed a model-based segmentation scheme that detects
man-made objects in cluttered scenes [4] [12].
3. A robust scheme for head detection is now in development, with successful results
for various poses of the human head [3] [6]. Also, a robust scheme for tracking human
body parts for the purpose of activity recognition is under construction.

5.  Technology  Transfer 

1. We provide a web-based interface to our preliminary video database. The web demo
is available at http://arik.eecs.uic.edu/cgi-bin/vdsearch.cgi. The database allows
the search of video files with content and action keywords. The interface
provides the capability for specifying objects and associated actions in a temporal
sequence. The temporal depth provided by the menu is 3. Additional simultaneous
specifications of objects and actions can also be made. Currently, the number of
actions that can be specified simultaneously is also three. A Java-based GUI is under
development, which will permit unrestricted models of hierarchical and temporal
specifications of video content.

2. In related research, Prof. Yu has also developed a metasearch engine to retrieve
text documents from multiple databases. A web demo is available at
http://yu.eecs.uic.edu:8080/demo/. New algorithms for retrieving the N most similar
documents with respect to a text query from multiple databases have been developed,
giving the capability for optimally searching distributed heterogeneous
databases. Different databases are optimally ranked during the search depending
on statistics generated from their content. It is shown that the documents retrieved
by this algorithm from a small subset of databases with respect to a query are
essentially the same as those that would be retrieved if all documents were placed
at a single site.

3. Human Resources and Training of Personnel: Minlin Deng, Ph.D. student; Tao
Hu, Ph.D. student; Zhiqian Wang, Ph.D. student; Dibyendu Nandy, Ph.D. student;
Xiao Du, Ph.D. student; R. Venkatasubramanian, M.S. student.

4. Education and curriculum development at all levels. The PI, Prof. Ben-Arie,
teaches graduate and undergraduate courses on image understanding, image analysis
and image processing. His research has influenced and contributed to the contents
of these courses. The Co-PI, Prof. Yu, teaches graduate and undergraduate
courses in Database systems and other areas of Computer Science. His research in
heterogeneous and distributed databases has influenced the material of these
courses and introduced novel database techniques to new students. The PI, Prof.
Sistla, also teaches graduate and undergraduate courses in Databases and other
areas of Computer Science. His research in HTL and temporal querying mechanisms
has also been reflected in the courses taught by him.